70 research outputs found
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Vision Transformers (ViTs) have shown impressive performance and have become
a unified backbone for multiple vision tasks. However, both attention and
multi-layer perceptrons (MLPs) in ViTs are not sufficiently efficient due to dense
multiplications, resulting in costly training and inference. To this end, we
propose to reparameterize the pre-trained ViT with a mixture of multiplication
primitives, e.g., bitwise shifts and additions, towards a new type of
multiplication-reduced model, dubbed ShiftAddViT, which aims for
end-to-end inference speedups on GPUs without the need to train from
scratch. Specifically, all matrix multiplications among queries, keys, and values
are reparameterized by additive kernels, after mapping queries and keys to
binary codes in Hamming space. The remaining MLPs or linear layers are then
reparameterized by shift kernels. We utilize TVM to implement and optimize
those customized kernels for practical hardware deployment on GPUs. We find
that such a reparameterization on (quadratic or linear) attention maintains
model accuracy, while inevitably leading to accuracy drops when applied
to MLPs. To marry the best of both worlds, we further propose a new mixture of
experts (MoE) framework to reparameterize MLPs by taking multiplication or its
primitives as experts, e.g., multiplication and shift, and designing a new
latency-aware load-balancing loss. Such a loss helps to train a generic router
for assigning a dynamic number of input tokens to different experts according
to their latency. In principle, the faster an expert runs, the more input
tokens it is assigned. Extensive experiments consistently validate the
effectiveness of our proposed ShiftAddViT, achieving up to
5.18× latency reductions on GPUs and 42.9% energy
savings, while maintaining accuracy comparable to original or efficient ViTs.
Comment: Accepted by NeurIPS 2023
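To make the shift reparameterization above concrete, here is a minimal PyTorch
sketch of a linear layer whose weights are rounded to signed powers of two, so
that on integer hardware each multiplication could be replaced by a sign flip
plus a bitwise shift. The class name, initialization, and straight-through
trick are illustrative assumptions, not the authors' TVM kernels.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ShiftLinear(nn.Module):
        # Hypothetical sketch: weights quantized to signed powers of two.
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(0.02 * torch.randn(out_features, in_features))

        def forward(self, x):
            w = self.weight
            sign = torch.sign(w)
            # Round |w| to the nearest power of two: 2^round(log2 |w|).
            exponent = torch.round(torch.log2(w.abs().clamp_min(1e-8)))
            w_shift = sign * torch.pow(2.0, exponent)
            # Straight-through estimator: shift weights in the forward pass,
            # gradients flow to the underlying dense weights.
            return F.linear(x, w + (w_shift - w).detach())

    layer = ShiftLinear(64, 32)
    print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 32])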
NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants
Tiny deep learning has attracted increasing attention driven by the
substantial demand for deploying deep learning on numerous intelligent
Internet-of-Things devices. However, it is still challenging to unleash tiny
deep learning's full potential on both large-scale datasets and downstream
tasks due to the under-fitting issues caused by the limited model capacity of
tiny neural networks (TNNs). To this end, we propose a framework called
NetBooster to empower tiny deep learning by augmenting the architectures of
TNNs via an expansion-then-contraction strategy. Extensive experiments show
that NetBooster consistently outperforms state-of-the-art tiny deep learning
solutions.
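As a rough illustration of one expansion-then-contraction step, the sketch
below over-parameterizes a single linear layer into two wider layers during
training and then folds them back into one. The purely linear expansion (which
makes the fold exact) is a simplifying assumption for illustration, not the
paper's actual recipe.

    import torch
    import torch.nn as nn

    class ExpandedLinear(nn.Module):
        # Expansion: one tiny layer temporarily becomes two wider ones.
        def __init__(self, in_f, out_f, expand=4):
            super().__init__()
            self.up = nn.Linear(in_f, expand * out_f, bias=False)
            self.down = nn.Linear(expand * out_f, out_f, bias=False)

        def forward(self, x):
            return self.down(self.up(x))

        def contract(self):
            # Contraction: fold both weight matrices back into a single
            # layer, recovering the original tiny architecture exactly.
            merged = nn.Linear(self.up.in_features, self.down.out_features, bias=False)
            merged.weight.data = self.down.weight @ self.up.weight
            return merged

    block, x = ExpandedLinear(16, 8), torch.randn(2, 16)
    assert torch.allclose(block(x), block.contract()(x), atol=1e-5)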
NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation
Boosting the task accuracy of tiny neural networks (TNNs) has become a
fundamental challenge for enabling the deployment of TNNs on edge devices,
which are constrained by strict limits on memory, computation, bandwidth,
and power supply. To this end, we propose a framework called
NetDistiller to boost the achievable accuracy of TNNs by treating them as
sub-networks of a weight-sharing teacher constructed by expanding the number of
channels of the TNN. Specifically, the target TNN model is jointly trained with
the weight-sharing teacher model via (1) gradient surgery to tackle the
gradient conflicts between them and (2) uncertainty-aware distillation to
mitigate the overfitting of the teacher model. Extensive experiments across
diverse tasks validate NetDistiller's effectiveness in boosting TNNs'
achievable accuracy over state-of-the-art methods. Our code is available at
https://github.com/GATECH-EIC/NetDistiller
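The gradient-surgery idea can be sketched with a PCGrad-style projection: when
the task gradient and the distillation gradient conflict, the conflicting
component is removed before the update. This is a hedged reading of "gradient
surgery" for illustration, not necessarily the exact rule in the repository
above.

    import torch

    def surgery_step(g_task, g_distill):
        # If the two gradients conflict (negative inner product), project
        # the distillation gradient onto the normal plane of the task
        # gradient before summing, PCGrad-style.
        dot = torch.dot(g_task, g_distill)
        if dot < 0:
            g_distill = g_distill - (dot / g_task.norm().pow(2)) * g_task
        return g_task + g_distill

    g_task = torch.tensor([1.0, 0.0])
    g_distill = torch.tensor([-0.5, 1.0])   # conflicts with g_task
    print(surgery_step(g_task, g_distill))  # tensor([1., 1.])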
Experimental study on the mechanical controlling factors of fracture plugging strength for lost circulation control in shale gas reservoir
The geological conditions of shale reservoirs present several unique
challenges. These include the extensive development of multi-scale fractures,
frequent losses during horizontal drilling, low success rates in plugging, and
a tendency for the fracture plugging zone to fail repeatedly. Extensive
analysis suggests that the weakening of the mechanical properties of shale
fracture surfaces is the primary factor reducing the bearing capacity of the
fracture plugging zone. To assess the influence of oil-based environments on
the degradation of the mechanical properties of shale fracture surfaces,
mechanical property tests were conducted on shale samples after exposure to
various substances, including white oil, lye, and the filtrate of oil-based
drilling fluid. The experimental results demonstrate that the average values
of the elastic modulus and indentation hardness of dry shale are 24.30 GPa and
0.64 GPa, respectively. Upon immersion in white oil, these values decrease to
22.42 GPa and 0.63 GPa, respectively. Additionally, the depth loss rates of
dry shale and white oil-soaked shale are determined to be 57.12% and 61.96%,
respectively, indicating an increased degree of fracturing on the shale
surface. White oil, lye, and the filtrate of oil-based drilling fluid all
reduce the friction coefficient of the shale surface; the average friction
coefficients measured for white oil, lye, and oil-based drilling fluid are
0.80, 0.72, and 0.76, respectively, reflecting their individual weakening
effects. Furthermore, the contact mode between the plugging materials and the
fracture surface can also reduce the friction coefficient between them. To
enhance the bearing capacity of the plugging zone, a series of plugging
experiments were conducted using high-strength materials, high-friction
materials, and nanomaterials, selected based on this understanding of the
weakened mechanical properties of the fracture surface. The results
demonstrate that the reduced mechanical properties of the fracture surface
diminish the pressure-bearing capacity of the plugging zone, whereas
high-strength materials, high-friction materials, and nanomaterials
effectively enhance it. These findings offer guidance for improving the
sealing pressure capacity of shale fractures and increasing the success rate
of leakage control measures during shale drilling and completion.
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference
Vision Transformers (ViTs) have shown impressive performance but still
require a high computation cost compared to convolutional neural networks
(CNNs). One reason is that ViTs' attention measures global similarities and
thus has quadratic complexity in the number of input tokens. Existing
efficient ViTs adopt local attention (e.g., Swin) or linear attention (e.g.,
Performer), which sacrifice ViTs' capabilities of capturing either global or
local context. In this work, we ask an important research question: Can ViTs
learn both global and local context while being more efficient during
inference? To this end, we propose a framework called Castling-ViT, which
trains ViTs using both linear-angular attention and masked softmax-based
quadratic attention, but then switches to having only linear-angular attention
during ViT inference. Our Castling-ViT leverages angular kernels to measure the
similarities between queries and keys via spectral angles. We further
simplify it with two techniques: (1) a novel linear-angular attention
mechanism: we decompose the angular kernels into linear terms and high-order
residuals, and only keep the linear terms; and (2) we adopt two parameterized
modules to approximate high-order residuals: a depthwise convolution and an
auxiliary masked softmax attention to help learn both global and local
information, where the masks for softmax attention are regularized to gradually
become zeros and thus incur no overhead during ViT inference. Extensive
experiments and ablation studies on three tasks consistently validate the
effectiveness of the proposed Castling-ViT, e.g., achieving up to a 1.8% higher
accuracy or 40% MACs reduction on ImageNet classification and a 1.2 higher mAP
on COCO detection under comparable FLOPs, compared to ViTs with vanilla
softmax-based attention.
Comment: CVPR 2023
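To show why dropping the quadratic map helps, here is a generic
linear-attention sketch using the associativity trick (compute K^T V before
multiplying by Q). The elu+1 feature map is a common stand-in and not the
paper's angular kernel, whose linear term the authors derive from spectral
angles.

    import torch
    import torch.nn.functional as F

    def linear_attention(q, k, v, eps=1e-6):
        # Non-negative feature map so attention weights stay valid; the
        # paper's linear-angular kernel is swapped for elu+1 here.
        phi_q, phi_k = F.elu(q) + 1, F.elu(k) + 1
        kv = torch.einsum('bnd,bne->bde', phi_k, v)   # K^T V: O(N d^2)
        z = torch.einsum('bnd,bd->bn', phi_q, phi_k.sum(1)) + eps
        return torch.einsum('bnd,bde->bne', phi_q, kv) / z.unsqueeze(-1)

    q = k = v = torch.randn(1, 196, 64)      # 196 tokens, a 14x14 ViT grid
    print(linear_attention(q, k, v).shape)   # torch.Size([1, 196, 64])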
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Vision Transformers (ViTs) have achieved state-of-the-art performance on
various vision tasks. However, ViTs' self-attention module is still arguably a
major bottleneck, limiting their achievable hardware efficiency. Meanwhile,
existing accelerators dedicated to NLP Transformers are not optimal for ViTs.
This is because there is a large difference between ViTs and NLP Transformers:
ViTs have a relatively fixed number of input tokens, whose attention maps can
be pruned by up to 90% even with fixed sparse patterns; while NLP Transformers
need to handle input sequences of varying numbers of tokens and rely on
on-the-fly predictions of dynamic sparse attention patterns for each input to
achieve a decent sparsity (e.g., >=50%). To this end, we propose a dedicated
algorithm and accelerator co-design framework dubbed ViTCoD for accelerating
ViTs. Specifically, on the algorithm level, ViTCoD prunes and polarizes the
attention maps to have either denser or sparser fixed patterns for regularizing
two levels of workloads without hurting the accuracy, largely reducing the
attention computations while leaving room for alleviating the remaining
dominant data movements; on top of that, we further integrate a lightweight and
learnable auto-encoder module to enable trading the dominant high-cost data
movements for lower-cost computations. On the hardware level, we develop a
dedicated accelerator to simultaneously coordinate the enforced denser/sparser
workloads and encoder/decoder engines for boosted hardware utilization.
Extensive experiments and ablation studies validate that ViTCoD largely reduces
the dominant data movement costs, achieving speedups of up to 235.3x, 142.9x,
86.0x, 10.1x, and 6.8x over general computing platforms (CPUs, EdgeGPUs, GPUs)
and prior-art Transformer accelerators (SpAtten and Sanger), respectively, under
an attention sparsity of 90%.
Comment: Accepted to HPCA 2023
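A toy version of the prune-and-polarize step might look like the following:
threshold a pretrained attention map into a fixed binary mask at a target
sparsity, then reorder columns by density so dense and sparse workloads
separate cleanly. The thresholding and reordering heuristics here are
assumptions for illustration, not ViTCoD's actual algorithm.

    import torch

    def polarize(attn, keep_ratio=0.1):
        # Keep only the strongest keep_ratio of entries as a fixed sparse
        # pattern, then sort columns so denser ones come first, separating
        # the two levels of workloads.
        k = max(1, int(attn.numel() * keep_ratio))
        thresh = attn.flatten().topk(k).values.min()
        mask = (attn >= thresh).float()
        order = mask.sum(dim=0).argsort(descending=True)
        return mask[:, order], order

    attn = torch.rand(16, 16).softmax(dim=-1)        # toy attention map
    mask, order = polarize(attn)
    print(f"density: {mask.mean().item():.2f}")      # ~0.10, i.e., 90% sparse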